Goto

Collaborating Authors

 pairwise potential



Review for NeurIPS paper: Belief Propagation Neural Networks

Neural Information Processing Systems

Weaknesses: The absence of any explaining figures makes it hard for the reader to fully understand the underlying architecture of BPNNs. Specifically, it is not clear what the actual target of a BPNN acting on a particular factor graph is! Is it the partition function? Moreover, the intention of the Bethe Free Energy Layer remains obscure. In lines 121-125 you mention: 'When convergence to a fixed point is unnecessary, we can increase the flexibility [...] Additionally we define a Bethe free energy layer using two MLPs that take the trajectories of learned beliefs from each factor and variable as input and output scalars.'


Probabilistic Multimodal Depth Estimation Based on Camera-LiDAR Sensor Fusion

Obando-Ceron, Johan S., Romero-Cano, Victor, Monteiro, Sildomar

arXiv.org Artificial Intelligence

Multi-modal depth estimation is one of the key challenges for endowing autonomous machines with robust robotic perception capabilities. There have been outstanding advances in the development of uni-modal depth estimation techniques based on either monocular cameras, because of their rich resolution, or LiDAR sensors, due to the precise geometric data they provide. However, each of these suffers from some inherent drawbacks, such as high sensitivity to changes in illumination conditions in the case of cameras and limited resolution for the LiDARs. Sensor fusion can be used to combine the merits and compensate for the downsides of these two kinds of sensors. Nevertheless, current fusion methods work at a high level. They process the sensor data streams independently and combine the high-level estimates obtained for each sensor. In this paper, we tackle the problem at a low level, fusing the raw sensor streams, thus obtaining depth estimates which are both dense and precise, and can be used as a unified multi-modal data source for higher level estimation problems. This work proposes a Conditional Random Field model with multiple geometry and appearance potentials. It seamlessly represents the problem of estimating dense depth maps from camera and LiDAR data. The model can be optimized efficiently using the Conjugate Gradient Squared algorithm. The proposed method was evaluated and compared with the state-of-the-art using the commonly used KITTI benchmark dataset.


JFP: Joint Future Prediction with Interactive Multi-Agent Modeling for Autonomous Driving

Luo, Wenjie, Park, Cheolho, Cornman, Andre, Sapp, Benjamin, Anguelov, Dragomir

arXiv.org Artificial Intelligence

We propose JFP, a Joint Future Prediction model that can learn to generate accurate and consistent multi-agent future trajectories. For this task, many different methods have been proposed to capture social interactions in the encoding part of the model, however, considerably less focus has been placed on representing interactions in the decoder and output stages. As a result, the predicted trajectories are not necessarily consistent with each other, and often result in unrealistic trajectory overlaps. In contrast, we propose an end-to-end trainable model that learns directly the interaction between pairs of agents in a structured, graphical model formulation in order to generate consistent future trajectories. It sets new state-of-the-art results on Waymo Open Motion Dataset (WOMD) for the interactive setting. We also investigate a more complex multi-agent setting for both WOMD and a larger internal dataset, where our approach improves significantly on the trajectory overlap metrics while obtaining on-par or better performance on single-agent trajectory metrics.


Modeling Label Correlations for Ultra-Fine Entity Typing with Neural Pairwise Conditional Random Field

Jiang, Chengyue, Jiang, Yong, Wu, Weiqi, Xie, Pengjun, Tu, Kewei

arXiv.org Artificial Intelligence

Ultra-fine entity typing (UFET) aims to predict a wide range of type phrases that correctly describe the categories of a given entity mention in a sentence. Most recent works infer each entity type independently, ignoring the correlations between types, e.g., when an entity is inferred as a president, it should also be a politician and a leader. To this end, we use an undirected graphical model called pairwise conditional random field (PCRF) to formulate the UFET problem, in which the type variables are not only unarily influenced by the input but also pairwisely relate to all the other type variables. We use various modern backbones for entity typing to compute unary potentials, and derive pairwise potentials from type phrase representations that both capture prior semantic information and facilitate accelerated inference. We use mean-field variational inference for efficient type inference on very large type sets and unfold it as a neural network module to enable end-to-end training. Experiments on UFET show that the Neural-PCRF consistently outperforms its backbones with little cost and results in a competitive performance against cross-encoder based SOTA while being thousands of times faster. We also find Neural- PCRF effective on a widely used fine-grained entity typing dataset with a smaller type set. We pack Neural-PCRF as a network module that can be plugged onto multi-label type classifiers with ease and release it in https://github.com/modelscope/adaseq/tree/master/examples/NPCRF.


Discriminative training of conditional random fields with probably submodular constraints

Berman, Maxim, Blaschko, Matthew B.

arXiv.org Machine Learning

Problems of segmentation, denoising, registration and 3D reconstruction are often addressed with the graph cut algorithm. However, solving an unconstrained graph cut problem is NP-hard. For tractable optimization, pairwise potentials have to fulfill the submodularity inequality. In our learning paradigm, pairwise potentials are created as the dot product of a learned vector w with positive feature vectors. In order to constrain such a model to remain tractable, previous approaches have enforced the weight vector to be positive for pairwise potentials in which the labels differ, and set pairwise potentials to zero in the case that the label remains the same. Such constraints are sufficient to guarantee that the resulting pairwise potentials satisfy the submodularity inequality. However, we show that such an approach unnecessarily restricts the capacity of the learned models. Guaranteeing submodularity for all possible inputs, no matter how improbable, reduces inference error to effectively zero, but increases model error. In contrast, we relax the requirement of guaranteed submodularity to solutions that are probably approximately submodular. We show that the conceptually simple strategy of enforcing submodularity on the training examples guarantees with low sample complexity that test images will also yield submodular pairwise potentials. Results are presented in the binary and muticlass settings, showing substantial improvement from the resulting increased model capacity.


Learning from weakly dependent data under Dobrushin's condition

Dagan, Yuval, Daskalakis, Constantinos, Dikkala, Nishanth, Jayanti, Siddhartha

arXiv.org Machine Learning

Statistical learning theory has largely focused on learning and generalization given independent and identically distributed (i.i.d.) samples. Motivated by applications involving time-series data, there has been a growing literature on learning and generalization in settings where data is sampled from an ergodic process. This work has also developed complexity measures, which appropriately extend the notion of Rademacher complexity to bound the generalization error and learning rates of hypothesis classes in this setting. Rather than time-series data, our work is motivated by settings where data is sampled on a network or a spatial domain, and thus do not fit well within the framework of prior work. We provide learning and generalization bounds for data that are complexly dependent, yet their distribution satisfies the standard Dobrushin's condition. Indeed, we show that the standard complexity measures of Gaussian and Rademacher complexities and VC dimension are sufficient measures of complexity for the purposes of bounding the generalization error and learning rates of hypothesis classes in our setting. Moreover, our generalization bounds only degrade by constant factors compared to their i.i.d. analogs, and our learnability bounds degrade by log factors in the size of the training set.


Scaling Matters in Deep Structured-Prediction Models

Shevchenko, Aleksandr, Osokin, Anton

arXiv.org Machine Learning

Deep structured-prediction energy-based models combine the expressive power of learned representations and the ability of embedding knowledge about the task at hand into the system. A common way to learn parameters of such models consists in a multistage procedure where different combinations of components are trained at different stages. The joint end-to-end training of the whole system is then done as the last fine-tuning stage. This multistage approach is time-consuming and cumbersome as it requires multiple runs until convergence and multiple rounds of hyperparameter tuning. From this point of view, it is beneficial to start the joint training procedure from the beginning. However, such approaches often unexpectedly fail and deliver results worse than the multistage ones. In this paper, we hypothesize that one reason for joint training of deep energy-based models to fail is the incorrect relative normalization of different components in the energy function. We propose online and offline scaling algorithms that fix the joint training and demonstrate their efficacy on three different tasks.


Self-Guided Belief Propagation -- A Homotopy Continuation Method

Knoll, Christian, Kulmer, Florian, Pernkopf, Franz

arXiv.org Machine Learning

We propose self-guided belief propagation (SBP) that modifies belief propagation (BP) by incorporating the pairwise potentials only gradually. This homotopy continuation method converges to a unique solution and increases the accuracy without increasing the computational burden. We apply SBP to grid graphs, complete graphs, and random graphs with random Ising potentials and show that: (i) SBP is superior in terms of accuracy whenever BP converges, and (ii) SBP obtains a unique, stable, and accurate solution whenever BP does not converge. We further provide a formal analysis to demonstrate that SBP obtains the global optimum of the Bethe approximation for attractive models with unidirectional fields.


Graph Refinement based Tree Extraction using Mean-Field Networks and Graph Neural Networks

Selvan, Raghavendra, Kipf, Thomas, Welling, Max, Pedersen, Jesper H, Petersen, Jens, de Bruijne, Marleen

arXiv.org Machine Learning

Graph refinement, or the task of obtaining subgraphs of interest from over-complete graphs, can have many varied applications. In this work, we extract tree structures from image data by, first deriving a graph-based representation of the volumetric data and then, posing tree extraction as a graph refinement task. We present two methods to perform graph refinement. First, we use mean-field approximation (MFA) to approximate the posterior density over the subgraphs from which the optimal subgraph of interest can be estimated. Mean field networks (MFNs) are used for inference based on the interpretation that iterations of MFA can be seen as feed-forward operations in a neural network. This allows us to learn the model parameters using gradient descent. Second, we present a supervised learning approach using graph neural networks (GNNs) which can be seen as generalisations of MFNs. Subgraphs are obtained by jointly training a GNN based encoder-decoder pair, wherein the encoder learns useful edge embeddings from which the edge probabilities are predicted using a simple decoder. We discuss connections between the two classes of methods and compare them for the task of extracting airways from 3D, low-dose, chest CT data. We show that both the MFN and GNN models show significant improvement when compared to a baseline method, that is similar to a top performing method in the EXACT'09 Challenge, in detecting more branches.